# Hierarchical Vision Transformer
Hiera Abswin Base Mim
Apache-2.0
A Hiera image encoder employing an absolute window position embedding strategy, pre-trained via Masked Image Modeling (MIM), serving as a general-purpose feature extractor or backbone network for downstream tasks.
Image Classification
birder-project
72
0
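The entry above describes the encoder as a general-purpose feature extractor rather than a classifier. As a minimal sketch of that usage with the Hugging Face transformers Auto classes, the snippet below pools a global feature vector from a Hiera encoder; the `facebook/hiera-base-224-hf` repo id is taken from the Facebook entries further down and is only a stand-in here, since the birder-project checkpoint itself may need the birder toolkit's own loading path. The image path is a placeholder.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

# Stand-in checkpoint (assumption): any transformers-compatible Hiera encoder loads the same way.
ckpt = "facebook/hiera-base-224-hf"

processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModel.from_pretrained(ckpt)

image = Image.open("example.jpg").convert("RGB")  # placeholder: any RGB image on disk
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the final token sequence into one global descriptor for downstream use.
features = outputs.last_hidden_state.mean(dim=1)
print(features.shape)  # (1, hidden_dim)
```

Mean-pooling the last hidden states is just one pooling choice; a downstream head could equally consume the per-stage features that a hierarchical backbone exposes.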
Hiera Huge 224 Hf
Hiera is an efficient hierarchical vision Transformer model that excels at image and video tasks while maintaining fast runtime.
Image Classification
Transformers English

facebook
41
1
Hiera Large 224 Hf
Hiera is a hierarchical vision Transformer model that is fast, powerful, and simple, surpassing prior state-of-the-art models on image and video tasks while running faster.
Image Classification
Transformers English

facebook
532
1
Hiera Base Plus 224 Hf
Hiera is a hierarchical vision Transformer model that is fast, powerful, and simple, surpassing prior state-of-the-art results across a wide range of image and video tasks while running significantly faster.
Image Classification
Transformers English

facebook
15
0
Hiera Base 224 Hf
Hiera is a hierarchical vision Transformer model that is fast, powerful, and simple, excelling at image and video tasks.
Image Classification
Transformers English

facebook
163
0
Hiera Base 224 In1k Hf
Hiera is a hierarchical vision Transformer model that is fast, powerful, and simple. It surpasses prior state-of-the-art results across a wide range of image and video tasks while running significantly faster.
Image Classification
Transformers English

facebook
188
2
Hiera Small 224 Hf
Hiera is a hierarchical vision Transformer model that combines speed, strong performance, and a minimalist design, surpassing prior state-of-the-art models on image and video tasks with high computational efficiency.
Image Classification
Transformers English

facebook
23
0
Hiera Tiny 224 Hf
Hiera is a hierarchical vision Transformer model that is fast, powerful, and extremely simple. It surpasses prior state-of-the-art methods across a wide range of image and video tasks while running significantly faster.
Image Classification
Transformers English

facebook
8,208
0
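For the classification-tagged Hiera entries above, a minimal inference sketch with the transformers Auto classes might look like the following. The repo id is assumed from the entry names (the `...-in1k-hf` variant is the one fine-tuned on ImageNet-1k with a classification head), and the image path is a placeholder.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Assumed repo id derived from the "Hiera Base 224 In1k Hf" entry above.
ckpt = "facebook/hiera-base-224-in1k-hf"

processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModelForImageClassification.from_pretrained(ckpt)

image = Image.open("cat.jpg").convert("RGB")  # placeholder image path
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Top-1 ImageNet-1k label.
print(model.config.id2label[logits.argmax(-1).item()])
```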
Upernet Swin Large
MIT
UperNet is a framework for semantic segmentation; this checkpoint pairs it with a Swin Transformer backbone for pixel-level scene understanding.
Image Segmentation
Transformers English

openmmlab
3,251
0
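A hedged usage sketch for the UperNet entry: transformers exposes `UperNetForSemanticSegmentation`, and the repo id below is assumed from the entry name, so substitute whichever UperNet-with-Swin checkpoint you actually use. The image path is a placeholder.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, UperNetForSemanticSegmentation

# Assumed repo id derived from the "Upernet Swin Large" entry above.
ckpt = "openmmlab/upernet-swin-large"

processor = AutoImageProcessor.from_pretrained(ckpt)
model = UperNetForSemanticSegmentation.from_pretrained(ckpt)

image = Image.open("street.jpg").convert("RGB")  # placeholder image path
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Upsample the logits back to the input resolution and take the per-pixel argmax.
seg = processor.post_process_semantic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
print(seg.shape)  # (height, width) map of class indices
```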
Nat Small In1k 224
MIT
NAT-Small is a hierarchical vision transformer based on neighborhood attention, designed for image classification tasks.
Image Classification
Transformers Other

shi-labs
6
0
Dinat Mini In1k 224
MIT
DiNAT-Mini is a hierarchical vision Transformer model based on the dilated neighborhood attention mechanism, designed for image classification tasks.
Image Classification
Transformers

shi-labs
462
1
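The two neighborhood-attention entries above (NAT and DiNAT) load through the same transformers Auto classes, with the caveat that they depend on the separate `natten` package for the neighborhood attention kernels. A minimal sketch, with repo ids assumed from the entry names and a placeholder image path:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Requires `pip install natten` in addition to transformers.
# Assumed repo ids derived from the entries above.
ckpt = "shi-labs/dinat-mini-in1k-224"   # or "shi-labs/nat-small-in1k-224"

processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModelForImageClassification.from_pretrained(ckpt)

image = Image.open("cat.jpg").convert("RGB")  # placeholder image path
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

print(model.config.id2label[logits.argmax(-1).item()])
```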
Swinv2 Large Patch4 Window12to24 192to384 22kto1k Ft
Apache-2.0
Swin Transformer v2 is a vision Transformer model pre-trained on ImageNet-21k and fine-tuned on ImageNet-1k at 384x384 resolution, featuring hierarchical feature maps and local window self-attention mechanisms.
Image Classification
Transformers

microsoft
3,048
4
Swinv2 Large Patch4 Window12to16 192to256 22kto1k Ft
Apache-2.0
Swin Transformer v2 is a vision Transformer model that handles image classification and dense recognition tasks efficiently through hierarchical feature maps and local window self-attention mechanisms.
Image Classification
Transformers

microsoft
812
4
Swinv2 Base Patch4 Window12to24 192to384 22kto1k Ft
Apache-2.0
Swin Transformer v2 is a vision Transformer model that handles image classification and dense recognition tasks efficiently through hierarchical feature maps and local window self-attention mechanisms.
Image Classification
Transformers

microsoft
1,824
0
Swinv2 Base Patch4 Window12to16 192to256 22kto1k Ft
Apache-2.0
Swin Transformer v2 is a vision Transformer model that achieves efficient image classification through hierarchical feature maps and local window-based self-attention mechanisms.
Image Classification
Transformers

microsoft
459
1
Swinv2 Base Patch4 Window12 192 22k
Apache-2.0
Swin Transformer v2 is a vision Transformer model that achieves efficient image processing through hierarchical feature maps and local window self-attention mechanisms.
Image Classification
Transformers

microsoft
8,603
3
Swinv2 Base Patch4 Window16 256
Apache-2.0
Swin Transformer v2 is a vision Transformer model that handles image classification and dense recognition tasks efficiently through hierarchical feature maps and local window self-attention mechanisms.
Image Classification
Transformers

microsoft
1,853
3
Swinv2 Base Patch4 Window8 256
Apache-2.0
Swin Transformer v2 is a vision Transformer model that handles image classification and dense recognition tasks efficiently through hierarchical feature maps and local window self-attention mechanisms.
Image Classification
Transformers

microsoft
16.61k
7
Swinv2 Small Patch4 Window16 256
Apache-2.0
Swin Transformer v2 is a vision Transformer model that achieves efficient image processing through hierarchical feature maps and local window self-attention mechanisms.
Image Classification
Transformers

microsoft
315
1
Swinv2 Tiny Patch4 Window16 256
Apache-2.0
Swin Transformer v2 is a vision Transformer model that achieves efficient image classification through hierarchical feature maps and local window self-attention mechanisms.
Image Classification
Transformers

microsoft
403.69k
5
Swinv2 Tiny Patch4 Window8 256
Apache-2.0
Swin Transformer v2 is a vision Transformer model pre-trained on ImageNet-1k, featuring hierarchical feature maps and local window self-attention mechanisms with linear computational complexity.
Image Classification
Transformers

microsoft
25.04k
10
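For the Swin Transformer v2 entries, the sketch below runs classification and also requests the intermediate hidden states, whose shrinking token counts are the hierarchical feature maps the descriptions refer to. The repo id is assumed from the entry name, and the image path is a placeholder.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Assumed repo id derived from the "Swinv2 Tiny Patch4 Window8 256" entry above.
ckpt = "microsoft/swinv2-tiny-patch4-window8-256"

processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModelForImageClassification.from_pretrained(ckpt)

image = Image.open("cat.jpg").convert("RGB")  # placeholder image path
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Classification result.
print(model.config.id2label[outputs.logits.argmax(-1).item()])

# Token count drops stage by stage as patches are merged: the hierarchical feature maps.
for i, h in enumerate(outputs.hidden_states):
    print(i, tuple(h.shape))  # (batch, num_tokens, hidden_dim)
```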
Swin Large Patch4 Window12 384
Apache-2.0
Swin Transformer is a hierarchical vision Transformer model based on shifted windows, specifically designed for image classification tasks.
Image Classification
Transformers

microsoft
22.77k
1
Swin Base Patch4 Window7 224 In22k
Apache-2.0
Swin Transformer is a hierarchical window-based vision Transformer model pretrained on the ImageNet-21k dataset, suitable for image classification tasks.
Image Classification
Transformers

microsoft
13.30k
15
Swin Large Patch4 Window12 384 In22k
Apache-2.0
Swin Transformer is a hierarchical window-based vision Transformer model, pretrained on the ImageNet-21k dataset, suitable for image classification tasks.
Image Classification
Transformers

microsoft
1,063
7
Swin Base Patch4 Window12 384 In22k
Apache-2.0
Swin Transformer is a hierarchical vision Transformer based on shifted windows, specifically designed for image classification tasks.
Image Classification
Transformers

microsoft
2,431
1
Swin Small Patch4 Window7 224
Apache-2.0
Swin Transformer is a hierarchical window-based vision Transformer model designed for image classification tasks, with computational complexity linearly related to input image size.
Image Classification
Transformers

microsoft
2,028
1
Swin Tiny Patch4 Window7 224
Apache-2.0
Swin Transformer is a hierarchical vision Transformer that achieves linear computational complexity by computing self-attention within local windows, making it suitable for image classification tasks.
Image Classification
Transformers

microsoft
98.00k
42
Swin Base Patch4 Window7 224
Apache-2.0
Swin Transformer is a hierarchical vision transformer based on shifted windows, suitable for image classification tasks.
Image Classification
Transformers

microsoft
281.49k
15
Swin Large Patch4 Window7 224
Apache-2.0
Swin Transformer is a hierarchical vision Transformer that achieves linear computational complexity by computing self-attention within local windows, making it suitable for image classification and dense recognition tasks.
Image Classification
Transformers

microsoft
2,079
1
Swin Large Patch4 Window7 224 In22k
Apache-2.0
Swin Transformer is a hierarchical vision transformer based on shifted windows, pretrained on the ImageNet-21k dataset, suitable for image classification tasks.
Image Classification
Transformers

microsoft
387
2
Swin Base Patch4 Window12 384
Apache-2.0
Swin Transformer is a hierarchical vision transformer based on shifted windows, specifically designed for image classification tasks, with computational complexity linear in the input image size.
Image Classification
Transformers

microsoft
1,421
4
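Finally, for the original Swin Transformer checkpoints, the high-level transformers pipeline API is enough for a quick test. The repo id is assumed from the "Swin Base Patch4 Window12 384" entry above, and any of the microsoft/swin-* classification checkpoints in this list should behave the same way; the image path is a placeholder.

```python
from transformers import pipeline

# Assumed repo id derived from the entry name above.
classifier = pipeline("image-classification", model="microsoft/swin-base-patch4-window12-384")

# Path or URL to any RGB image (placeholder).
for pred in classifier("cat.jpg"):
    print(f'{pred["label"]}: {pred["score"]:.3f}')
```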